Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ParameterizedType and Embed #3339

Conversation

TheFirstAvenger
Copy link
Contributor

@TheFirstAvenger TheFirstAvenger commented Jun 24, 2020

Edit: Per the conversation in this PR, this has evolved from ParameterizedType and Enum to ParameterizedType and Embed (with Enum coming later).

Original Description:

This is a follow up PR after the discussion in #3337 with @josevalim and @wojtekmach. The desire is to have :enum built on an underlying "parameterized types" foundation. This PR is a first attempt at implementing parameterized types, and using them to implement :enum.

Parameterized types are specified in the format {type, keyword_opts} like this:

  field "bar", {MyApp.MyType, opt1: :baz, opt2: :boo}

In the above example, MyApp.MyType is a module that uses (and implements the behaviour) Ecto.ParameterizedType. ParameterizedType is similar to Ecto.Type, except all functions take an additional keyword opt parameter. Additionally, an init/1 callback exists so the options can be validated and possibly transformed from what is specified in the field macro into what the other callback functions need to have access to in their new additional parameters.

Ecto.Type.Enum is also added as both an example of how to create a parameterized type, and the module that implements the new standard:enum parameterized type in Ecto. It has one parameter, values. To create an enum field, you specify:

  field "foo", {:enum, values: [:foo, :bar, :baz]}

Under the hood, Ecto translates this to:

  field "foo", {Ecto.Type.Enum, values: [:foo, :bar, :baz]}

and then :enum is handled just like any other ParameterizedType.

If this approach looks good, I can add appropriate testing and references to :enum and ParameterizedType in the greater Ecto docs.

@josevalim
Copy link
Member

Thanks @TheFirstAvenger, this is really promising!

The next step we have always discussed when introducing parameterized types is if we can implement embeds as parameterized types. This would be the ultimate proof that our parameterized types are flexible enough to allow any extensions to Ecto. What do you think about exploring this venue? Also, for now, let's drop the enum type from this PR, so we can validate it. Then adding enum after will be really easy!

@@ -2038,6 +2048,15 @@ defmodule Ecto.Schema do
end
end

defp parameterized_type?({type, opts}) when is_atom(type) and is_list(opts) do
Code.ensure_compiled(type) == {:module, type} &&
Copy link
Member

@josevalim josevalim Jun 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more reliable check is probably to use Ecto.Type.primitive? to see if it is a primitive type. If it is not a primitive type (and it is a tuple), then it is a parameterized type. Something like:

  defp parameterized_type?({type, opts}) when is_atom(type) do
    not Ecto.Type.primitive_type?({type, opts})
  end

@TheFirstAvenger
Copy link
Contributor Author

TheFirstAvenger commented Jun 24, 2020

@josevalim Sounds like a plan! To make sure I am understanding the embed requirement as you intend it, do you mean that basically this would be happening (either explicitly or under the hood):

  schema "orders" do
    field :items, {Ecto.Type.EmbedMany, type: Item}
    field :reciept, {Ecto.Type.Embed, type: Reciept}
  end

and any other reference to embed or embed_many (including things like cast_embed) in the existing internal Ecto codebase would go away?

If yes, then I imagine there could be a similar : version like how I did with the :enum that translated to the full Module under the hood to indicate it is a core feature:

  schema "orders" do
    field :items, {:embed_many, type: Item}
    field :items, {:embed, type: Reciept}
  end

And the existing embeds and embeds_many macros would remain and expand under the hood to the top version above?

@josevalim
Copy link
Member

Exactly. We probably won't change our public API, but being able to do:

    field :items, {Ecto.Type.EmbedMany, type: Item}

would be great!

For example, you will notice right out the box that, for this, the type will have to receive the name of the field too. There are also places in Ecto.Changeset we will have to change to enhance the type/parameterized type API.

@TheFirstAvenger
Copy link
Contributor Author

TheFirstAvenger commented Jun 25, 2020

@josevalim What are your thoughts on cast_embed? Does it go away completely because you can simply use the regular cast function for the param, and the options like with, required, etc... move to the field definition parameters, and are triggered inside the Embed.cast/2 function? If I am thinking of this correctly, what do we do with backwards compatibility for cast_embed?

edit: required would actually move to validate_required

@josevalim
Copy link
Member

Let's support it in cast and keep cast_embed around for compatibility. :)

@TheFirstAvenger TheFirstAvenger force-pushed the mb-parameterized-types-and-enum branch from 35d809c to 69fe96b Compare July 1, 2020 20:06
@TheFirstAvenger TheFirstAvenger changed the title Add ParameterizedType and :enum Add ParameterizedType and Embed Jul 1, 2020
@TheFirstAvenger
Copy link
Contributor Author

@josevalim I have been iterating on this, I just pushed the latest version, with the basics of Embed added, and casting/loading/dumping working in many of the basic paths. There is obviously a ton more work to do with the various options and handling them correctly, adding EmbedMany, and handling the legacy functions for embeds, but if you have a second to take a look, I would appreciate any input on the approach.

@josevalim
Copy link
Member

Sweet. I will review it soon and drop some comments!

@@ -1193,6 +1219,11 @@ defmodule Ecto.Changeset do
%{changeset | changes: changes, errors: errors, valid?: valid?}
end

defp put_change(data, changes, errors, valid?, key, value, {:parameterized, Ecto.Type.Embed, opts}) do
relation = Ecto.Type.Embed.get_relation(opts)
put_change(data, changes, errors, valid?,key, value, {:embed, relation})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a new callback, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, part of the issue was that a lot of my iterations on this went down rabbit holes of supporting the existing put_* and cast_* relation functions, and I had tried it on a number of different levels inside the existing logic. This being my first deep dive into how Ecto works under the hood, there was a lot of TDD where I was simply trying to clean up the things that broke when it saw a :parameterized tuple. I finally stepped back from that so I could concentrate on the direct cast, load, and dump paths, which allowed me to make progress. I also shifted towards doing as much as possible in the Embed function itself, and not trying to "fix" the logic inside changeset surrounding the "relation" object. I think this is an instance of me finding a path that worked early on (by faking a relation object), and I need to go back and figure out if/when I should be jumping out to a callback in parameterzed type as opposed to simply massaging the data and hooking into the current put_change logic.

"value `#{inspect(value)}` for `#{inspect(schema)}.#{field}` " <>
"in `#{action}` does not match type #{type}"
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm... is the issue here that the result of the parameterized dump still needs to be dumped? 🤔 I wonder if we should move this to a feature set of the parameterized type itself but I am not sure how. Either dump can return a special tuple that means it needs to be further dumped or we pass a callback to dump so it happens recursively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am approaching it wrong, or misunderstood the dump function, but as it stands, I modeled parameterized types following the pattern of the type callback on custom types, they still have an underlying "base" data structure such as a map or string. Parameterized types know how to get themselves into their own base format (e.g. Embed "dumps" to a :map), and then Ecto knows how to "dump" the base type. The reasoning on this would be that authors of parameterized types would only have to worry about dumping their data to a higher level type like map, and ecto can then handle the mechanics of taking it from a map to the underlying data structure. They could always specify a type of :any and return the base storable values, as that would basically make the second call in the with statement a noop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you said makes perfect sense. So I believe what needs to be "fixed" here is to move this logic to inside Ecto.Type, rather than here. :)

@@ -1517,7 +1518,8 @@ defmodule Ecto.Schema do
defmacro embeds_one(name, schema, opts, do: block) do
quote do
{schema, opts} = Ecto.Schema.__embeds_module__(__ENV__, unquote(schema), unquote(opts), unquote(Macro.escape(block)))
Ecto.Schema.__embeds_one__(__MODULE__, unquote(name), schema, opts)
opts = Keyword.put(opts, :type, unquote(schema))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this a feature of the parameterized types too (i.e. they all receive the current schema as an option).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 I'll inject that.

lib/ecto/type.ex Outdated

@typedoc "Primitive Ecto types (handled by Ecto)."
@type primitive :: base | composite

@typedoc "Custom types are represented by user-defined modules."
@type custom :: module

@typedoc "Parameterized custom types are represented by user-defined modules and options."
@type parameterized :: {module, keyword()} | {:parameterized, module, keyword()}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of the double representation but I am not sure if we have other alternatives. I guess one option is to make the type a map or a struct? This way you can dispatch to it by calling type.__struct__.dump() and so on. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be a bit more precise, we would declare them for now as field :foo, %Ecto.Enum{values: [:foo, :bar]}. It is not the best API in the world but we can always add sugar later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The {module, keyword()} representation only appears in the field call, and Dialyzer was complaining in check_field_type! because primitive? takes a Type.t. I can catch and convert those before check_field_type! and remove the {module, keyword()} version from type.

edit: replied before I saw your second reply

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think I understand what you mean: It would be best if the internally tracked "type" was simply the module that implements parameterized, with the opts stored inside them, instead of explicitly internally handling {:parameterized, type, opts}. This would be similar to how Custom types are handled, but would require determining if a module is custom or parameterized at a number of stages in the code.

Given what I have seen so far on this, I see the benefit of that approach but I still would lean towards maintaining the explicit :parameterized tuple internally because it makes branching on the logic that needs to be handled different much cleaner, with less guess work and less error prone. The lines between a custom type and a parameterized type would get blurry and we would have to pick some test to test if the type is a custom or parameterized type everywhere that we currently assume it is custom. Alternatively, with the tuple method, we know for a fact that the user intended the type to be parameterized from the start in the field call, so carrying that over in a :parameterized tuple to me feels like the safer way, but let me know which one you would like and I will adjust accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The {module, keyword()} representation only appears in the field call, and Dialyzer was complaining in check_field_type! because primitive? takes a Type.t. I can catch and convert those before check_field_type! and remove the {module, keyword()} version from type.

Yes, let's go this route then. Please ignore my second comment. :)

@josevalim
Copy link
Member

Fantastic job, this is exactly what I had in mind! ❤️

@@ -0,0 +1,72 @@
defmodule Ecto.ParameterizedType do
Copy link
Member

@wojtekmach wojtekmach Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always thought we'd implement parameterised types as part of Ecto.Type. That is, instead of the current:

use Ecto.Type

@impl true
def cast(term) do

folks would do:

use Ecto.Type

@impl true
def cast(term, params)

and if your type is not parameterised you'd just define cast/1 and we'd define cast/2 for you as part of use Ecto.Type.

But perhaps it's better not to try shoehorning it in. Definitely better start to keep it as separate thing as you have, we always have an option to merge the concepts down the road if that would be an improvement.

This is only tangentially related, but I always considered our composite types ({:array, _} and {:map, _}) as another example of parameterised types besides embeds and we do handle them in Ecto.Type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtekmach good to know, I will keep that in mind as I work on this. Yes, if we could pull it off, Parameterized Types being simply Ecto.Type but with opts would be nice. I would agree that at least for now, keeping it separate will allow us to concentrate on getting it done right, and we can merge down the road if possible.

And yes, I would love to eventually pull :array and :map out to parameterized, cleaning out a bunch of existing hardcoded code to handle them. Bonus points if we can figure out how to pull :assoc out to parameterized 😁.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we cannot use Ecto.Type because of the "param" argument, so Ecto.ParameterizedType is the way to go IMO. If anything, we would make Ecto.Type become Ecto.ParameterizedType in the future (but most likely never).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, I would love to eventually pull :array and :map out to parameterized, cleaning out a bunch of existing hardcoded code to handle them. Bonus points if we can figure out how to pull :assoc out to parameterized 😁.

This is actually more complex because the argument given to :array and :map is actually another type itself. To support them in ParameterizedType, then we would definitely need to pass a dumper function to dump, loader function to load and so on, because all of them are somewhat recursive with context. Maybe we should do that anyway or maybe embeds themselves already require this.

@TheFirstAvenger
Copy link
Contributor Author

@josevalim There are a lot of tests that use Relation. I am still learning the inner workings of Ecto. Is the Relation module something that we need going forward? It seems to be only used in functions that are going to be rewritten in this PR, and is @moduledoc false, so not a public API. Is it ok to remove this and the associated tests (and any needed logic in it move to Embed), or am I missing something?

@josevalim
Copy link
Member

@TheFirstAvenger it is completely fine to remove it, yes. The unit tests are for convenience, we can move the unit tests anywhere else you want.

Keep in mind though the goal of the Relation module is to share common functionality between associations and embeds. So you may want to keep it with this purpose still. Up to you.

@TheFirstAvenger
Copy link
Contributor Author

@josevalim @wojtekmach just an update: I have been progressing slowly through this. The two big hurdles still left that I can see are:

  1. reconciling changes/casts for list in EmbedMany (it currently only works on same-length lists and naively compares each item in the list in order).
  2. Embedded ids

As you can see I have added @tag :skip # reason to each of the currently failing tests with a note on each one as to what I think might need to be fixed to get it working. If you happen to have a chance to re-review the progress so far, any thtoughts would be appreciated.

@Eiji7
Copy link
Contributor

Eiji7 commented Jul 28, 2020

@TheFirstAvenger For me it looks like this one:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/test/ecto/repo/embedded_test.exs#L98

fails because this function needs to be updated:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/lib/ecto/repo/schema.ex#L702

as after changes in the code this clause is incorrectly matched:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/lib/ecto/repo/schema.ex#L727-L728
where field is :embed and value is 1.

Anyway no matter if value is correct you are always putting it as a change. Therefore even if we would not have an exception then this line:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/test/ecto/repo/embedded_test.exs#L101
would never match as you are never putting any error on changeset for embed.

@TheFirstAvenger
Copy link
Contributor Author

@Eiji7 Nice catch, just pushed a fix, thanks!

@Eiji7
Copy link
Contributor

Eiji7 commented Jul 29, 2020

@TheFirstAvenger This test:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/test/ecto/query/planner_test.exs#L363-L366
is failing, because you have changed schema.

57100494 (now 67865558) is a hash of schema. It's generated here:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/lib/ecto/schema.ex#L549
as you can see hashing algorithm uses for this primary key fields and query fields.

The hash in planner is added here:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/lib/ecto/query/planner.ex#L723-L724

I'm not really sure if we are interested in checking hash at least in this specific test. For now you can easily change hash and later @josevalim may decide whether he want to pattern match on it in this test.

@Eiji7
Copy link
Contributor

Eiji7 commented Jul 29, 2020

@TheFirstAvenger Here:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/test/ecto/query/planner_test.exs#L843-855

planner is failing on prewalk for json like we used in test p.metas[0]["slug"]. The problem is invalid type here:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/lib/ecto/query/planner.ex#L1093-L1096
as it returns :list which is going to last clause containing raise:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/lib/ecto/query/planner.ex#L1110-L1111

Therefore you need to change code like that:

    # …
    case get_source!(kind, query, ix) do
      {_, nil, _} ->
        :ok

      {_, {:parameterized, module, opts}, _} ->
        validate_json_path!(path, field, module.type(opts))

      {_, :map, _} -> :ok

      {_, {:map, _}, _} -> :ok

      {_, nil, _} ->
        raise "field `#{field}` does not exist in #{inspect(schema)}"

      {_, other, _} ->
        raise "expected field `#{field}` to be an embed or a map, got: `#{inspect(other)}`"
    end
    # …

and after that you need to also rewrite validate_json_path!/3 function which is located here:
https://github.com/TheFirstAvenger/ecto/blob/d103ebec47387b8945f16b9fbaaf70fc53b38693/lib/ecto/query/planner.ex#L979-L999

@TheFirstAvenger
Copy link
Contributor Author

@Eiji7 Thanks! I updated the hash and took a shot at implementing validate_json_logic in a generic way for Parameterized Types.

@Eiji7
Copy link
Contributor

Eiji7 commented Jul 31, 2020

@TheFirstAvenger Here:
https://github.com/TheFirstAvenger/ecto/blob/mb-parameterized-types-and-enum/test/ecto/changeset_test.exs#L165
you have lot of things (related to Changeset) still to do …

First of all you need to remove @relations attribute:
https://github.com/TheFirstAvenger/ecto/blob/mb-parameterized-types-and-enum/lib/ecto/changeset.ex#L306
and after it for all occurrences of using this attribute you need to ensure:

  1. Each when is rewritten to pattern-matching of :assoc atom
  2. Ensure that after modify of 1st point {:parameterized, _, _} as type is supported

Regarding this specific test you need to especially:

  1. Rewrite this: https://github.com/TheFirstAvenger/ecto/blob/mb-parameterized-types-and-enum/test/ecto/changeset_test.exs#L173-L184 to: {:parameterized, Ecto.Type.Embed, %{type: SocialSource}}
  2. Add source to cast (as same as title): https://github.com/TheFirstAvenger/ecto/blob/mb-parameterized-types-and-enum/test/ecto/changeset_test.exs#L192
  3. Remove cast_embed call: https://github.com/TheFirstAvenger/ecto/blob/mb-parameterized-types-and-enum/test/ecto/changeset_test.exs#L193

After that you would need also to rewrite case in Ecto.Changeset.apply_changes/1: https://github.com/TheFirstAvenger/ecto/blob/mb-parameterized-types-and-enum/lib/ecto/changeset.ex#L1531-L1538 by:

  1. Rewriting {:ok, {tag, relation}} when tag in @relations to {:ok, {:assoc relation}}
  2. Adding extra clause for {:parameterized, module, opts}

btw. Looks like this PR is taking 2 months. Would you like to team with me let's say on Slack or maybe let me finish it in reasonable time? I don't want to rush you, but within "5 min" I got few interesting ideas for this new feature and I believe that not only me is waiting for it.

@TheFirstAvenger
Copy link
Contributor Author

@Eiji7 Yes, sorry, I started a work project right as I submitted the first iteration of this so I haven't had time to dedicate to it, however staring today I have a block of time I can dedicate. I would appreciate any assistance you could provide. I have sent you a message in the Elixir Slack.

@Eiji7
Copy link
Contributor

Eiji7 commented Jul 31, 2020

@josevalim I have a good news! I've teamed up with @TheFirstAvenger on Slack and we will work on this PR faster now.

The plan for now is to:

  1. Firstly I would give hints for those test files:
    test/ecto/changeset_test.exs:  @tag :skip # Need to revert Changeset IO inspect changes before merging
    test/ecto/repo_test.exs:    @tag :skip # Need to figure out adapter stuff
    test/ecto/type_test.exs:    @tag :skip # Need to fix this
    test/ecto/type_test.exs:    @tag :skip # Need to fix this
    
    @TheFirstAvenger should have enough information for 3 of 4 tests and I would debug last one yet today. 🙂
  2. After that we would have only 3 files with failing tests:
    test/ecto/changeset/embedded_test.exs (21 skipped)
    test/ecto/changeset/embedded_no_pk_test.exs (15 skipped)
    test/ecto/repo/embedded_test.exs (15 skipped)
    
    We will work on each file separately as many of tests inside would probably be related and therefore one fix would make few tests pass.

I would like to know:

  1. If it was ok to remove tests with such comments:

    @tag :skip # I believe this goes away
    @tag :skip # I believe this should be deleted

  2. Can we edit and move changes from Generate IDs for nested embeds #3360 here? Otherwise we would have a conflict.
  3. Of course after fixing skipped tests is there anything this PR should do?

Assuming that we only need to fix skipped tests and @TheFirstAvenger would have enough time then we can finish this PR even in next week!

@josevalim
Copy link
Member

Awesome!!!

  1. Let's not remove tests for now, you can skip them and I can take a look at them and we remove them in a separate PR.

  2. Let's not do that. Let's do it after this is merged. Easier to track things.

  3. I will be sure only after I do a thorough review. I assume we will have to do some changes to keep compatibility but it should be relatively straight-forward. 👍

@TheFirstAvenger
Copy link
Contributor Author

I just pushed a first pass at reconciling lists of items in EmbedMany. It also consolidates cast/change into cast_or_change similar to how Relation does it. It still needs work, but it is a start.

Copy link
Contributor

@Eiji7 Eiji7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like:

  1. Some changes should be reverted (should not be part of this PR or creates a breaking change)
  2. Some pattern-matching for {:embed, _} is added, but I do not see an equivalent for {:parameterized, type, opts} match
  3. Some {:parameterized, type, opts} were added without removing other code - looks like workaround or those pattern should be moved somewhere else

What do you think about it?

Comment on lines +605 to +653
# @doc """
# Loads previously dumped `data` in the given `format` into a schema.

# The first argument can be a an embedded schema module, or a map (of types) and
# determines the return value: a struct or a map, respectively.

# The second argument `data` specifies fields and values that are to be loaded.
# It can be a map, a keyword list, or a `{fields, values}` tuple. Fields can be
# atoms or strings.

# The third argument `format` is the format the data has been dumped as. For
# example, databases may dump embedded to `:json`, this function allows such
# dumped data to be put back into the schemas.

# Fields that are not present in the schema (or `types` map) are ignored.
# If any of the values has invalid type, an error is raised.

# Note that if you want to load data into a non-embedded schema that was
# directly persisted into a given repository, then use `c:Ecto.Repo.load/2`.

# ## Examples

# iex> result = Ecto.Adapters.SQL.query!(MyRepo, "SELECT users.settings FROM users", [])
# iex> Enum.map(result.rows, fn [settings] -> Ecto.embedded_load(Setting, Jason.decode!(settings), :json) end)
# [%Setting{...}, ...]
# """
# @spec embedded_load(
# module_or_map :: module | map(),
# data :: map(),
# format :: atom()
# ) :: Ecto.Schema.t() | map()
# def embedded_load(schema_or_types, data, format) do
# Ecto.Schema.Loader.unsafe_load(schema_or_types, data, &Ecto.Type.embedded_load(&1, &2, format))
# end

# @doc """
# Dumps the given struct defined by an embedded schema.

# This converts the given embedded schema to a map to be serialized
# with the given format. For example:

# iex> Ecto.embedded_dump(%Post{}, :json)
# %{title: "hello"}

# """
# @spec embedded_dump(Ecto.Schema.t(), format :: atom()) :: map()
# def embedded_dump(%schema{} = data, format) do
# Ecto.Schema.Loader.safe_dump(data, schema.__schema__(:dump), &Ecto.Type.embedded_dump(&1, &2, format))
# end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a breaking change … We can add @deprecated and support it as long as we do not hard deprecated it, right?

Comment on lines 783 to 800
# # raise "cast_embed is no longer necessary, instead add the embedded type to the `cast` function"
# # raise inspect opts
# %{data: data, types: types, params: params, changes: changes, errors: errors} = changeset
# empty_values = [nil]
# {changes, errors, valid?} =
# process_param(name, params, types, data, empty_values, %{}, {changes, errors, changeset.valid?})

# {:parameterized, embed_type, _embed_opts} = type!(types, name)

# if embed_type not in [Ecto.Type.Embed, Ecto.Type.EmbedMany], do: raise "bad embed type"

# if opts[:with] do
# raise "with: is no longer supported in cast_embed. Please specify `with` in the schema field definition"
# end

# %Changeset{params: params, data: data, valid?: valid?,
# errors: Enum.reverse(errors), changes: changes,
# types: types, empty_values: empty_values}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like those comments are not necessary, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an early pass that I wasn't ready to delete yet. I have removed it.

@@ -767,8 +777,27 @@ defmodule Ecto.Changeset do
using `with: {Author, :special_changeset, ["hello"]}` will be invoked as
`Author.special_changeset(changeset, params, "hello")`
"""
#@deprecated "use cast instead"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason to comment it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It throws warnings in tests if it is tagged as deprecated. I haven't yet determined the right strategy to test both the legacy method and the new method for all tests, so I still use the cast_embed in tests for now.

Copy link
Contributor

@Eiji7 Eiji7 Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it - that's compile time warning and there is really simple way to avoid such warnings:

# in test body instead of:
My.deprecated(:call)
# use:
module = My
# module here is runtime variable!
module.deprecated(:call)

I remember this perfectly as I had exactly same problem with Enum.chunk/3 and Enum.chunk/4:
https://github.com/elixir-lang/elixir/blob/master/lib/elixir/test/elixir/enum_test.exs#L50-L53

Next time feel free to ask. Even if issue/pr is not best for it you still have me and others on Slack and forum. 🙂

Comment on lines +3017 to +3018
# defimpl Inspect, for: Ecto.Changeset do
# import Inspect.Algebra

# def inspect(changeset, opts) do
# list = for attr <- [:action, :changes, :errors, :data, :valid?] do
# {attr, Map.get(changeset, attr)}
# end

# container_doc("#Ecto.Changeset<", list, ">", opts, fn
# {:action, action}, opts -> concat("action: ", to_doc(action, opts))
# {:changes, changes}, opts -> concat("changes: ", to_doc(changes, opts))
# {:data, data}, _opts -> concat("data: ", to_struct(data, opts))
# {:errors, errors}, opts -> concat("errors: ", to_doc(errors, opts))
# {:valid?, valid?}, opts -> concat("valid?: ", to_doc(valid?, opts))
# end)
# end

# defp to_struct(%{__struct__: struct}, _opts), do: "#" <> Kernel.inspect(struct) <> "<>"
# defp to_struct(other, opts), do: to_doc(other, opts)
# end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be reverted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be before merging. The inspect protocol override hides valuable data in the changeset which makes it harder to debug issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheFirstAvenger Wait, who said that inspect protocol needs to be good for debugging? For this just write code like:

changeset |> Map.from_struct() |> IO.inspect(label: "All map keys and values from Ecto.Changeset struct")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How I work through this, I add and remove dozens of IO.inspects an hour, not always knowing which variable might be a changeset and which might not (in many cases it could be either). It is helpful for me during development to have this commented out, but as mentioned I will remove it when we are closer to being ready to merge.

Comment on lines 52 to 54
def apply_changes({:parameterized, Ecto.Type.Embed, _opts}, _changeset) do
raise "shouldn't happen anymore"
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed. It's unexpected behaviour. If needed for now please add proper TODO comment.

Comment on lines -571 to -573
def load({:embed, embed}, value, loader) do
load_embed(embed, value, loader)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not we have an equivalent for {:parameterized, type, opts} match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is handled by this line which uses this

Comment on lines -898 to -900
def adapter_load(_adapter, {:embed, embed}, nil) do
load_embed(embed, nil, &load/2)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not we have an equivalent for {:parameterized, type, opts} match?

lib/ecto/type.ex Outdated
Comment on lines 880 to 888
def adapter_dump(adapter, {:parameterized, type, opts}, value) do
base_type = type.type(opts)
with {:ok, parameterized_value} <- type.dump(value, opts),
{:ok, final_value} <- adapter_dump(adapter, base_type, parameterized_value) do
{:ok, final_value}
else
:error -> :error
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually have an error with that. How about just remove it as it does not replaces {:embed, _} match?

defp equal_fun({:parameterized, mod, opts}) when is_atom(mod) do
if loaded_and_exported?(mod, :equal?, 3) do
fn a, b ->
mod.equal?(a, b, opts)
end
end
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are not removing any {:embed, _} match is it one necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Embeds didn't previously pass through the cast function, so they never hit this code path. Now that they do, we need to be able to test the equality of the parameterized type after casting.

Comment on lines -30 to +32
]
],
line_length: 210,
inputs: ["*.{ex,exs}", "{config,lib,test}/**/*.{ex,exs}"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be a part of this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be removed before merging

@@ -520,8 +518,14 @@ defmodule Ecto.Changeset do
case cast_field(key, param_key, type, params, current, empty_values, defaults, valid?) do
{:ok, value, valid?} ->
{Map.put(changes, key, value), errors, valid?}
:ignore ->
{changes, errors, valid?}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between :ignore and :missing (below)? :)

@josevalim
Copy link
Member

Hi @TheFirstAvenger!

I have spent most of the day today reviewing this. There are a bunch of places that I am really happy they have been generalized (such as the embeds handling in Ecto.Repo.Schema and the put_change logic) and other places which I am 100% sure that should not be generalized, such as validate_json_path!. Functions such as validate_json_path! is very specific to how embeds work and I doubt they would be useful for other purposes.

So after some thought, I think we won't be able to fully convert embeds to parameterized types, but there is likely a healthy amount of code that can be unified if we go down this path. Perhaps the compromise here is to mostly treat embeds as parameterized types while still keeping them their own construct. In other words, embeds still have their own {:embed, ...} type but something like put_change could perhaps be rewritten to this:

def put_change(..., {tag, relation}) when tag in @relation do
  put_change(..., {:param, Relation, relation})
end

The idea is at the high-level they are distinct but internally they are converted and treated as parameterized types whenever possible, allowing us to remove hard-coded logic and provide more extensible parameterized types. This will also allow us to migrate embeds step by step without requiring a complete rewrite of the embeds code.

What do you think? Is this a good feedback? Does it provide a positive direction? What do you think the next steps should be?

@TheFirstAvenger
Copy link
Contributor Author

@josevalim Yes, thank you, that is helpful high level feedback. There are plenty of details to figure out, but it is a good direction. I wonder if we could keep embeds as {:parameterized, Ecto.Type.Embed, opts}, but in places where we need to hard-code some logic for Embeds (such as conditionally calling validate_json_path!), we pattern match specifically the {:parameterized, Ecto.Type.Embed, opts} pattern out and do it. In this way, Embeds would still be a ParameterizedType, but Ecto would have some hard-coded logic that only applies to Embed.

I think the next step is to determine specifically what needs to be rolled back, and how to connect the logic paths appropriately. For example, I wonder if we should restore the legacy cast_embed code path, update it to recognize the :parameterized tuple instead of :embed, and then branch inside the regular cast such that if an embed is one of the casted fields, it calls into cast_embed instead of following the regular cast logic for that one field. This would maintain the legacy cast_embed functionality as is, but still allow users to migrate to using cast for embeds like they do for regular fields.

@josevalim
Copy link
Member

I wonder if we could keep embeds as {:parameterized, Ecto.Type.Embed, opts},

I believe the issue with this change is that it is backwards incompatible (for example, phoenix_ecto maches on {:embed | :assoc, _}) and it will force us to immediately change all parts of Ecto at the same time. So I would prefer to not touch their underlying definition.


On the more practical aspect, here is a suggestion for moving forward. We introduce the Ecto.ParameterizedType behaviour. On v1, it has all of the callbacks as Ecto.Type except they have an additional argument which is exactly the param. Once it is ready, we merge it in.

For v2, we will add the changes so we can remove Relation specific logic from Ecto.Changeset. In this step, I expect we will modify the cast callback, possibly add the change and apply_changes callbacks.

For v3 I have no idea... yet. :) Maybe we will move the logic in Ecto.Repo.Schema to a callback too. The important is that each of these versions will just be a separate PR, so we can iterate on it by parts now that we know what works and what does not.

WDYT?

@Eiji7
Copy link
Contributor

Eiji7 commented Aug 6, 2020

@josevalim Answering your comments:

I have spent most of the day today reviewing this. There are a bunch of places that I am really happy they have been generalized (such as the embeds handling in Ecto.Repo.Schema and the put_change logic) and other places which I am 100% sure that should not be generalized, such as validate_json_path!. Functions such as validate_json_path! is very specific to how embeds work and I doubt they would be useful for other purposes.

I generally agree with validate_json_path!. Honestly I was a bit surprised when it was added as callback. However I started to think in other way …

We want to create generic Ecto.Type, so would we have benefits over validate_json_path?
First of all json is definitely a part of embeds …
"embeds" (plural)?
Then how about without json part i.e. validate_path?
What other paths we have? Would they be useful?
What would happen if let's say PostgreSQL would add new type just like jsonb was added?
How about XPath? How about querying other non-json documents?

In that way I found that:

  1. At least 2 types (Embeds and EmbedsMany) would already support it.
  2. Other libraries may want to add support for custom paths like how now we support json

From that I think that it should be added, but as optional callback.


So after some thought, I think we won't be able to fully convert embeds to parameterized types, but there is likely a healthy amount of code that can be unified if we go down this path. Perhaps the compromise here is to mostly treat embeds as parameterized types while still keeping them their own construct. In other words, embeds still have their own {:embed, ...} type but something like put_change could perhaps be rewritten to this:

Ideally I would like to not hardcode things. I would really go that way and in future make also associations be a parameterized types. If that would be possible and new behaviour would be as generic as possible then in future me or somebody else may use this behaviour to write an amazing things which would simplify many use cases for example polymorphic associations and embeds.

If we would have enough generic behaviour then instead of creating 2 fields (like assoc_name_id and assoc_name_type) we would be able to define just one field (like assoc_name) as parameterized type which would extremely simplify it's logic.

defmodule MyApp.Comment do
  use Ecto.Schema
  use MyLib

  schema "comments" do
    belongs_to :author, MyApp.User
    exactly_one_of :source, :belongs_to do
      polym :post, MyApp.Post
    end
    # helpful macro which would generate something like:
    #  field :source, {MyLib.ExactlyOneOf, fields: [%{schema: MyApp.Post, source: :post_id, type: :post}, …]}
  end
end

Such example would allow us to query :source just like :post, but MyLib.ExactlyOneOf would simply generate an SQL case statement checking which field of [:post_id, …] is not nil and returning it with type (for example :post) without storing said type in database.


I believe the issue with this change is that it is backwards incompatible (for example, phoenix_ecto maches on {:embed | :assoc, _}) and it will force us to immediately change all parts of Ecto at the same time. So I would prefer to not touch their underlying definition.

I think same … Even if we would take a look at changeset errors they are going to change. I'm definitely for doing this, but this PR is somehow going too forward.


On the more practical aspect, here is a suggestion for moving forward. We introduce the Ecto.ParameterizedType behaviour. On v1, it has all of the callbacks as Ecto.Type except they have an additional argument which is exactly the param. Once it is ready, we merge it in.

For v2, we will add the changes so we can remove Relation specific logic from Ecto.Changeset. In this step, I expect we will modify the cast callback, possibly add the change and apply_changes callbacks.

For v3 I have no idea... yet. :) Maybe we will move the logic in Ecto.Repo.Schema to a callback too. The important is that each of these versions will just be a separate PR, so we can iterate on it by parts now that we know what works and what does not.

From what can I see v1 is going to be the only backwards-compatible and then we would fully move embeds not sooner than in v2, right?

Definitely 👍 for this - I really like this idea - doing all now looks complicated and it's better to split it into smaller places

Also I would be really interested for contributing up to v3 and I think that moving logic in Ecto.Repo.Schema to a callback would give many advantages especially in my example of usage.

@TheFirstAvenger
Copy link
Contributor Author

@josevalim That sounds like a plan. To be sure I understand, v1 would not touch :embed at all, it would simply introduce the new ParameterizedType and fully support it in the codebase for the callbacks type, cast, load, dump, equal?, and embed_as. Since we won't have Embed as a Proof of Viability in v1, could we implement enum instead?

If this sounds good, I can go ahead and close this PR, and start on a clean v1 PR.

@Eiji7
Copy link
Contributor

Eiji7 commented Aug 6, 2020

could we implement enum instead?

@TheFirstAvenger I think so, but firstly go with v1, so we would have such PRs:

  1. Parameterized v1
  2. Parameterized enum
  3. Parameterized v2 - first breaking changes and changes in tests goes here
  4. Parameterized embeds
  5. Parameterized v3
  6. Parameterized associations?

less changes == easier to review

@wojtekmach
Copy link
Member

Parameterized v2 - first breaking changes and changes in tests goes here

just to be clear, there are no plans for Ecto v4.0 and so it's a no-no on breaking changes to the public API.

@Eiji7
Copy link
Contributor

Eiji7 commented Aug 6, 2020

just to be clear, there are no plans for Ecto v4.0 and so it's a no-no on breaking changes to the public API.

Yes, I know about that, but there is no way to fully rewrite embeds to parameterized type without changing public API.

@josevalim
Copy link
Member

josevalim commented Aug 7, 2020

That sounds like a plan. To be sure I understand, v1 would not touch :embed at all, it would simply introduce the new ParameterizedType and fully support it in the codebase for the callbacks type, cast, load, dump, equal?, and embed_as. Since we won't have Embed as a Proof of Viability in v1, could we implement enum instead?

Absolutely correct. My suggestion is to implement enum in test/ for v1. The reason for so is to avoid discussing what should be the enum API at the same time we are discussing the parameterized API. :) Then we can extract it from test into an actual feature later on (but keeping the tests themselves)!

Yes, I know about that, but there is no way to fully rewrite embeds to parameterized type without changing public API.

Which is why we will have to hardcode some clauses that convert {:embed, _} into {:param, _, _} in some places for now. API breaking is non-negotiable, sorry.

One thing we could try, however, is to decouple the types in __changeset__ from the ones returned by __schema__(:type, _). It may be that we can change __schema__(:type, _) to return the parameterized type while keeping {:embed, _} in changesets. This means the API surface (schemas and changesets) knows about embeds but none of the internals (i.e. Ecto.Type) need to. This would also allow us to provide a more piece-meal migration.

@josevalim
Copy link
Member

josevalim commented Aug 7, 2020

One thing we could try, however, is to decouple the types in changeset from the ones returned by schema(:type, _). It may be that we can change schema(:type, _) to return the parameterized type while keeping {:embed, _} in changesets.

So I have just pushed a commit that does exactly this: 5996203

After the commit above, the only parts that know about {:embed, _} in Ecto are Ecto.Schema and Ecto.Changeset. :D Ecto.Repo.Schema also checks for it on the surface_changes function but this is something we were talking about generalizing eventually.

For validate_json_path, I have decided to explicitly match on {:parameterized, Ecto.Embedded, embed}. It is something we can change in the future but it is acceptable for now while we focus on more pressing issues - it is not any worse than the hardcoding we had before.

Also note that, although the commit above does extract some things to Ecto.Embedded, it does not define the Ecto.ParameterizedType API at all, so we can move ahead with v1 as usual, except that we already have one consumer of the upcoming API. :)

@TheFirstAvenger
Copy link
Contributor Author

@josevalim Sounds good! Closing this and I should have a v1 PR up with ParameterizedType (and enum in test) shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants